This worksheet covers concepts from the first half of Module 1 - Exploratory Data Analysis in One Dimension. It should take no more than 20-30 minutes to complete. Please raise your hand if you get stuck.
There are many ways to accomplish the tasks you are presented with; however, if you use the techniques covered in class, the exercises should be relatively simple.
For this exercise, we will be using the following libraries:
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
In this exercise, you are given a list of email addresses called emails. Your goal is to find the email accounts whose domains end in .edu. To accomplish this, you will need to convert the list to a Pandas Series, keep only the addresses whose domain ends in .edu, and extract the account name (the part before the @).
If you get stuck, refer to the Pandas documentation on string manipulation (http://pandas.pydata.org/pandas-docs/stable/text.html) or the slides. Note that several different string methods can accomplish this task.
In [2]:
emails = ['alawrence0@prlog.org',
'blynch1@businessweek.com',
'mdixon2@cmu.edu',
'rvasquez3@1688.com',
'astone4@creativecommons.org',
'mcarter5@chicagotribune.com',
'dcole6@vinaora.com',
'kpeterson7@topsy.com',
'ewebb8@cnet.com',
'jtaylor9@google.ru',
'ecarra@buzzfeed.com',
'jjonesb@arizona.edu',
'jbowmanc@disqus.com',
'eduardo_sanchezd@npr.org',
'emooree@prweb.com',
'eberryf@brandeis.edu',
'sgardnerh@wikipedia.org',
'balvarezi@delicious.com',
'blewisj@privacy.gov.au']
In [3]:
email_series = pd.Series(emails)
# str.endswith anchors the match at the end of the string and does not treat '.' as a regex wildcard
filtered_emails = email_series[email_series.str.endswith('.edu')]
print(filtered_emails)
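A caution about the filtering step: Series.str.contains interprets its pattern as a regular expression by default, so the '.' in '.edu' matches any character, and the match may occur anywhere in the address. The minimal sketch below uses three made-up addresses (not taken from the emails list) to compare str.contains, str.contains with regex=False, and str.endswith.

```python
import pandas as pd

# Hypothetical addresses chosen to expose the differences between the methods
sample = pd.Series(['a@cmu.edu', 'b@xedu.com', 'd@mail.edu.example.com'])

# Regex match anywhere: '.' matches the 'x' in 'xedu.com'
regex_match = sample.str.contains('.edu')
# Literal substring match anywhere: still matches '.edu' in the middle of a domain
literal_match = sample.str.contains('.edu', regex=False)
# Anchored at the end of the string: only domains that really end in .edu
suffix_match = sample.str.endswith('.edu')

print(pd.DataFrame({'address': sample,
                    'contains (regex)': regex_match,
                    'contains (literal)': literal_match,
                    'endswith': suffix_match}))
```

Only str.endswith('.edu') matches the stated goal of domains that end in .edu. An anchored regular expression such as r'\.edu$' passed to str.contains would behave the same way.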
In [4]:
accounts = filtered_emails.str.split( '@').str[0]
print( accounts )
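Since several string methods can do this job, here is one alternative sketch: split each address once on '@' with expand=True to get separate account and domain columns, then filter on the domain column. The sample is a small subset of the emails list, and the column names 'account' and 'domain' are illustrative choices, not part of the exercise.

```python
import pandas as pd

# A few addresses taken from the worksheet's emails list
addresses = pd.Series(['mdixon2@cmu.edu', 'ewebb8@cnet.com', 'jjonesb@arizona.edu'])

# n=1 splits at the first '@' only; expand=True returns a DataFrame instead of lists
parts = addresses.str.split('@', n=1, expand=True)
parts.columns = ['account', 'domain']

# Keep the rows whose domain ends in .edu, then select the account column
edu_accounts = parts.loc[parts['domain'].str.endswith('.edu'), 'account']
print(edu_accounts)
```

Keeping the domain as its own column also makes follow-up questions easy, for example counting addresses per domain with parts['domain'].value_counts().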
In [5]:
poundsToKilograms = lambda x: x * 0.45359237
weights = [31.09, 46.48, 24.0, 39.99, 19.33, 39.61, 40.91, 52.24, 30.77, 17.23, 34.87 ]
In [6]:
pounds = pd.Series( weights )
kilos = pounds.apply( poundsToKilograms )
print( kilos )
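The cell above converts each weight with Series.apply, which calls the lambda once per value. Because arithmetic on a Series is already element-wise, multiplying by the conversion factor directly is a vectorized alternative that should produce the same values. The sketch below checks this with np.allclose; LB_TO_KG is a name introduced here for the factor used in the worksheet.

```python
import numpy as np
import pandas as pd

LB_TO_KG = 0.45359237  # kilograms per international pound (the worksheet's factor)

weights = [31.09, 46.48, 24.0, 39.99, 19.33, 39.61, 40.91, 52.24, 30.77, 17.23, 34.87]
pounds = pd.Series(weights)

# Element-wise via apply: the Python function is called once per value
kilos_apply = pounds.apply(lambda x: x * LB_TO_KG)
# Vectorized: a single multiplication over the whole underlying array
kilos_vectorized = pounds * LB_TO_KG

print(np.allclose(kilos_apply, kilos_vectorized))
```

For a simple scalar conversion like this, the vectorized form is shorter and avoids a Python-level function call per element; apply remains useful when the per-element logic cannot be expressed as array arithmetic.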
You are given a Series of IP addresses, and the goal is to filter this data down to private IP addresses. Python's ipaddress module provides the capability to create, manipulate, and operate on IPv4 and IPv6 addresses and networks. Complete documentation is available here: https://docs.python.org/3/library/ipaddress.html.
Here are some examples of how you might use this module:
import ipaddress

myIP = ipaddress.ip_address('192.168.0.1')
myNetwork = ipaddress.ip_network('192.168.0.0/28')

# Check membership in a network
if myIP in myNetwork:  # This works
    print("Yay!")

# Loop through a CIDR block
for ip in myNetwork:
    print(ip)

# Output:
# 192.168.0.0
# 192.168.0.1
# …
# …
# 192.168.0.13
# 192.168.0.14
# 192.168.0.15

# Test whether an IP is private
if myIP.is_private:
    print("This IP is private")
else:
    print("Routable IP")
Using the ipaddress module, filter the list of hosts below down to the private IP addresses.
In [2]:
hosts = [ '192.168.1.2', '10.10.10.2', '172.143.23.34', '34.34.35.34', '172.15.0.1', '172.17.0.1']
In [4]:
from ipaddress import ip_address
IPData = pd.Series( hosts )
privateIPs = IPData[IPData.apply( lambda x : ip_address(x).is_private ) ]
print( privateIPs )
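It can help to see why only some of the 172.x addresses survive the filter. The RFC 1918 private block 172.16.0.0/12 covers 172.16.0.0 through 172.31.255.255, so 172.17.0.1 is private while 172.15.0.1 and 172.143.23.34 are not. The sketch below checks membership in the three RFC 1918 blocks explicitly; in_rfc1918 is a hypothetical helper. Note that is_private is broader than RFC 1918 (it also covers ranges such as loopback and link-local), so the two checks can disagree on other inputs.

```python
from ipaddress import ip_address, ip_network

# The three RFC 1918 private IPv4 blocks
RFC1918_BLOCKS = [ip_network('10.0.0.0/8'),
                  ip_network('172.16.0.0/12'),
                  ip_network('192.168.0.0/16')]

def in_rfc1918(ip_str):
    """Hypothetical helper: True if ip_str falls inside an RFC 1918 block."""
    ip = ip_address(ip_str)
    return any(ip in block for block in RFC1918_BLOCKS)

hosts = ['192.168.1.2', '10.10.10.2', '172.143.23.34', '34.34.35.34', '172.15.0.1', '172.17.0.1']
for host in hosts:
    # Print the address, the explicit RFC 1918 result, and the is_private result
    print(host, in_rfc1918(host), ip_address(host).is_private)
```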
In [9]:
def is_private(ip):
    # Return True if the address string falls in a private range
    return ip_address(ip).is_private
In [11]:
IPData[ IPData.apply(is_private) ]
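One practical caveat: ip_address raises a ValueError when a string is not a valid IPv4 or IPv6 address, so a single malformed entry would make apply fail for the whole Series. If your real data might contain such entries, a tolerant variant like the hypothetical safe_is_private below treats them as not private; whether that is the right policy depends on your analysis.

```python
from ipaddress import ip_address
import pandas as pd

def safe_is_private(ip):
    """Hypothetical variant: returns False for strings that are not valid IP addresses."""
    try:
        return ip_address(ip).is_private
    except ValueError:
        return False

# Example data with one malformed entry added for illustration
ip_data = pd.Series(['192.168.1.2', 'not-an-ip', '34.34.35.34'])
print(ip_data[ip_data.apply(safe_is_private)])
```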